Parallel Weighted Random Sampling
Abstract
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps. We give efficient, fast, and practicable parallel and distributed algorithms for building data structures that support sampling single items (alias tables, compressed data structures). This also yields a simplified and more space-efficient sequential algorithm for alias table construction. Our approaches to sampling k out of n items with/without replacement and to subset (Poisson) sampling are output-sensitive, i.e., the sampling algorithms use work linear in the number of different samples. This is also interesting in the sequential case. Weighted random permutation can be done by sorting appropriate random deviates. We show that this is possible with linear work. Finally, we give a communication-efficient, highly scalable approach to (weighted and unweighted) reservoir sampling. This algorithm is based on a fully distributed model of streaming algorithms that might be of independent interest. Experiments for alias tables and sampling with replacement show near linear speedups using up to 158 threads of shared-memory machines. An experimental evaluation of distributed weighted reservoir sampling on up to 5,120 cores also shows good speedups.
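For readers unfamiliar with alias tables, the following is a minimal sequential sketch of the classic construction (often attributed to Walker and Vose). It is not the parallel or space-efficient algorithm from the paper. The function names `build_alias_table` and `sample` are hypothetical and chosen only for illustration.

```python
import random


def build_alias_table(weights):
    """Build an alias table (classic sequential method, not the paper's algorithm).

    Returns two lists of length n: prob[i] is the probability of keeping
    bucket i's own item, and alias[i] is the item returned otherwise.
    """
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    # Scale the weights so that the average bucket has size exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))

    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]

    # Repeatedly fill an under-full bucket with mass from an over-full item.
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)

    # Leftover buckets keep prob 1.0; this absorbs floating-point rounding.
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one item index in O(1): pick a bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

With weights `[1, 2, 3, 4]`, repeated calls to `sample` should return index 3 about 40% of the time. Construction takes linear time and each draw takes constant time, which is why alias tables are a natural target for the parallel construction the abstract describes.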
Related resources
Accelerating weighted random sampling without replacement
Random sampling from discrete populations is one of the basic primitives in statistical computing. This article briefly introduces weighted and unweighted sampling with and without replacement. The case of weighted sampling without replacement appears to be most difficult to implement efficiently, which might be one reason why the R implementation performs slowly for large problem sizes. This p...
Weighted Random Sampling over Data Streams
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2,4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Weighted Random Sampling (2005; Efraimidis, Spirakis)
The problem of random sampling without replacement (RS) calls for the selection of m distinct random items out of a population of size n. If all items have the same probability to be selected, the problem is known as uniform RS. Uniform random sampling in one pass is discussed in [1, 5, 10]. Reservoir-type uniform sampling algorithms over data streams are discussed in [11]. A parallel uniform r...
Random Sampling Techniques in Parallel Computation
Random sampling is an important tool in the design of parallel algorithms. Using random sampling, it is possible to obtain simple parallel algorithms that are efficient in practice. We will focus on the use of random sampling in fundamental problems such as sorting, selection, list ranking, and graph connectivity.
Slice sampling normalized kernel-weighted completely random measure mixture models
A number of dependent nonparametric processes have been proposed to model non-stationary data with unknown latent dimensionality. However, the inference algorithms are often slow and unwieldy, and are in general highly specific to a given model formulation. In this paper, we describe a large class of dependent nonparametric processes, including several existing models, and present a slice sampl...
Journal
Journal title: ACM Transactions on Mathematical Software
Year: 2022
ISSN: 0098-3500, 1557-7295
DOI: https://doi.org/10.1145/3549934